Generating High-Coverage Semantic Orientation Lexicons From Overtly Marked Words and a Thesaurus
نویسندگان
چکیده
Sentiment analysis often relies on a semantic orientation lexicon of positive and negative words. A number of approaches have been proposed for creating such lexicons, but they tend to be computationally expensive, and usually rely on significant manual annotation and large corpora. Most of these methods use WordNet. In contrast, we propose a simple approach to generate a high-coverage semantic orientation lexicon, which includes both individual words and multi-word expressions, using only a Roget-like thesaurus and a handful of affixes. Further, the lexicon has properties that support the Polyanna Hypothesis. Using the General Inquirer as gold standard, we show that our lexicon has 14 percentage points more correct entries than the leading WordNet-based high-coverage lexicon (SentiWordNet). In an extrinsic evaluation, we obtain significantly higher performance in determining phrase polarity using our thesaurus-based lexicon than with any other. Additionally, we explore the use of visualization techniques to gain insight into the our algorithm beyond the evaluations mentioned above.
منابع مشابه
Generating Semantic Orientation Lexicon using Large Data and Thesaurus
We propose a novel method to construct semantic orientation lexicons using large data and a thesaurus. To deal with large data, we use Count-Min sketch to store the approximate counts of all word pairs in a bounded space of 8GB. We use a thesaurus (like Roget) to constrain near-synonymous words to have the same polarity. This framework can easily scale to any language with a thesaurus and a unz...
متن کاملMorpho-syntactic Lexicon Generation Using Graph-based Semi-supervised Learning
Morpho-syntactic lexicons provide information about the morphological and syntactic roles of words in a language. Such lexicons are not available for all languages and even when available, their coverage can be limited. We present a graph-based semi-supervised learning method that uses the morphological, syntactic and semantic relations between words to automatically construct wide coverage lex...
متن کاملAcquisition of Lexical Resources from SNOMED for Medical Language Processing
Medical language processing depends on large-coverage, fine-grained specialized lexicons. The vast majority of existing electronic lexicons concern the English language; for other languages such as French, resources are scarce. In contrast, large medical thesauri exist in numerous languages, including French. Our goal was to study what kind of linguistic information could be extracted from thes...
متن کاملComputing Lexical Contrast
Knowing the degree of semantic contrast between words has widespread application in natural language processing, including machine translation, information retrieval, and dialogue systems. Manually-created lexicons focus on opposites, such as hot and cold. Opposites are of many kinds such as antipodals, complementaries, and gradable. However, existing lexicons often do not classify opposites in...
متن کاملNoun-phrase co-occurrence statistics for senti-automatic semantic lexicon construction
Generating semantic lexicons semiautomatically could be a great time saver, relative to creating them by hand. In this paper, we present an algorithm for extracting potential entries for a category from an on-line corpus, based upon a small set of exemplars. Our algorithm finds more correct terms and fewer incorrect ones than previous work in this area. Additionally, the entries that are genera...
متن کامل